Discussion of [Grefenstette, 1993] for CS 388 HW 4

Author

  • Jay Liu
Abstract

1 Auto-generated Thesaurus

A thesaurus is not only a useful resource for understanding related words and phrases in a document; it can also catalog correlations between words across multiple domains that were not previously obvious. For example, the word "mouse" in a document about animals is related to "cat" and "rabbit", while in an article from a different domain it may relate to the words "printer" and "keyboard" [Dorow and Widdows, 2003]. Both senses should appear in a thesaurus. Creating a thesaurus therefore requires a high degree of knowledge across many domains. Doing so manually requires domain experts, so the result can be expensive and still incomplete.

[Grefenstette, 1993] explores the possibility of automatically generating thesauri with a domain-independent algorithm. Domain independence is what is meant by "knowledge-poor". Given a large collection of documents containing sentences about some topic, the idea is to comb through the data and find terms that may relate to each other.

In [Grefenstette, 1993], four specific strategies for mining related concepts were investigated. Before proceeding, the words were syntactically analyzed: they were run through a part-of-speech tagger and labeled with the appropriate part of speech, and their morphological variants were normalized to reduce the number of distinct words. Each word then becomes a feature in its own right. Words that modify other words (such as adjectives and verbs for nouns) were treated as additional attributes of that word feature.

• Related nouns: Using the attributes as additional information about the nouns, each word pair can be assigned a similarity value. Words whose attributes are highly similar were often used in similar ways, so there is a high probability that the two words are related. For additional robustness, two words are considered related only if each is among the top N most similar words of the other.
• Related verbs: Similarly for verbs, the attributes are the nouns and phrases that are subjects or objects of the verbs. Verbs with similar attributes are related.
• Noun phrases: Noun phrases are treated like nouns, except that a larger number of adjacent words are used as attributes. This was found to produce more relevant attributes than a more stringent, smaller scope.
• Morphology discovery: Creative morphological development varied in form depending on the domain. A …
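
The mutual top-N test described under "Related nouns" can be illustrated with a small sketch. It rests on several assumptions: plain Jaccard overlap between attribute sets stands in for the similarity value (the abstract does not say which measure is used), the function names `jaccard`, `top_n_neighbors`, and `mutually_related` are hypothetical, and the toy attribute sets are invented for illustration only.

```python
def jaccard(a, b):
    """Jaccard overlap of two attribute sets (an assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def top_n_neighbors(attributes, n):
    """Map each word to its n most similar other words, dropping zero scores."""
    neighbors = {}
    for word, attrs in attributes.items():
        scored = [
            (jaccard(attrs, other_attrs), other)
            for other, other_attrs in attributes.items()
            if other != word
        ]
        # Highest similarity first; ties broken alphabetically for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        neighbors[word] = [other for score, other in scored[:n] if score > 0]
    return neighbors


def mutually_related(attributes, n=1):
    """Keep a pair only if each word is in the other's top-n neighbor list."""
    neighbors = top_n_neighbors(attributes, n)
    pairs = set()
    for word, near in neighbors.items():
        for other in near:
            if word in neighbors[other]:
                pairs.add(tuple(sorted((word, other))))
    return pairs


# Invented toy data: each noun maps to the modifiers observed with it.
toy_attributes = {
    "cat": {"furry", "small", "chase", "feed"},
    "rabbit": {"furry", "small", "feed", "hop"},
    "keyboard": {"wireless", "plug", "type"},
    "printer": {"wireless", "plug", "print"},
}

related = mutually_related(toy_attributes, n=1)
print(sorted(related))
```

On this toy data the sketch pairs "cat" with "rabbit" and "keyboard" with "printer". Requiring the relation to hold in both directions discards pairs where one word is merely the least bad match for the other.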




Publication date: 2008